TPA: Two Phase Approximation for Random Walk with Restart

Authors

  • Minji Yoon
  • Jinhong Jung
  • U. Kang
Abstract

Given a large graph, how can we determine the similarity between nodes quickly and accurately? Random walk with restart (RWR) is a popular measure for this purpose and has been exploited in numerous data mining applications, including ranking, anomaly detection, link prediction, and community detection. However, previous methods for computing exact RWR require prohibitive storage and computational costs, and alternative methods that avoid such costs by computing approximate RWR have limited accuracy. In this paper, we propose TPA, a fast, scalable, and highly accurate method for computing approximate RWR on large graphs. TPA exploits two important properties of RWR: 1) nodes close to a seed node are likely to be revisited in subsequent steps because of the block-wise structure of many real-world graphs, and 2) RWR scores of nodes that reside far from the seed node are proportional to their PageRank scores. Based on these two properties, TPA divides the approximate RWR problem into two subproblems called neighbor approximation and stranger approximation. In the neighbor approximation, TPA estimates RWR scores of nodes close to the seed from the scores of a few early steps from the seed. In the stranger approximation, TPA estimates RWR scores of nodes far from the seed using their PageRank scores. The stranger and neighbor approximations are conducted in the preprocessing phase and the online phase, respectively. Through extensive experiments, we show that TPA requires up to 3.5× less time and up to 40× less memory space than other state-of-the-art methods in the preprocessing phase. In the online phase, TPA computes approximate RWR up to 30× faster than existing methods while maintaining high accuracy.
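The two-phase split described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the step boundaries `S` and `T`, the iteration cap `n_iter`, and the rule that rescales the early-step scores to stand in for the middle steps are all assumptions made for illustration. The sketch also assumes every node has at least one out-edge, so the transition matrix is column-stochastic.

```python
import numpy as np

def column_normalize(adj):
    """Build a column-stochastic transition matrix from an adjacency matrix.
    Assumes every node has at least one out-edge (no zero rows)."""
    adj = np.asarray(adj, dtype=float)
    out_deg = adj.sum(axis=1, keepdims=True)   # row sums = out-degrees
    return (adj / out_deg).T                   # transpose: mass flows along out-edges

def partial_walk(P, q, c, start, stop):
    """Return sum over i in [start, stop) of c * (1 - c)^i * P^i q."""
    x = c * q                                  # term for step i = 0
    total = np.zeros_like(q)
    for i in range(stop):
        if i >= start:
            total += x
        x = (1 - c) * (P @ x)                  # advance to step i + 1
    return total

def two_phase_rwr(P, seed, c=0.15, S=5, T=10, n_iter=200):
    """Hypothetical two-phase approximation of RWR scores for one seed."""
    n = P.shape[0]
    # Preprocessing phase (seed-independent): late steps taken from PageRank,
    # i.e. the same walk started from the uniform distribution.
    uniform = np.full(n, 1.0 / n)
    stranger = partial_walk(P, uniform, c, T, n_iter)
    # Online phase: run only the first S steps from the seed ...
    q = np.zeros(n)
    q[seed] = 1.0
    early = partial_walk(P, q, c, 0, S)
    # ... and rescale them to stand in for steps S..T-1 (assumed rule:
    # match the probability mass those steps carry).
    scale = ((1 - c) ** S - (1 - c) ** T) / (1 - (1 - c) ** S)
    neighbor = scale * early
    return early + neighbor + stranger

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
P = column_normalize(adj)

q0 = np.zeros(6)
q0[0] = 1.0
exact = partial_walk(P, q0, 0.15, 0, 200)      # truncated series as reference
approx = two_phase_rwr(P, seed=0)
print("exact :", np.round(exact, 4))
print("approx:", np.round(approx, 4))
print("L1 error:", np.abs(exact - approx).sum())
```

Only the first `S` steps depend on the seed in this sketch, which is why the online phase can be cheap, while the PageRank-based part can be computed once and reused for every query.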

Similar resources

Supervised and Extended Restart in Random Walks for Ranking and Link Prediction in Networks

Given a real-world graph, how can we measure relevance scores for ranking and link prediction? Random walk with restart (RWR) provides an excellent measure for this and has been applied to various applications such as friend recommendation, community detection, anomaly detection, etc. However, RWR suffers from two problems: 1) using the same restart probability for all the nodes limits the expr...


IRWRLDA: improved random walk with restart for lncRNA-disease association prediction

In recent years, accumulating evidence has shown that the dysregulation of lncRNAs is associated with a wide range of human diseases. It is necessary and feasible to analyze known lncRNA-disease associations, predict potential lncRNA-disease associations, and provide the most probable lncRNA-disease pairs for experimental validation. Considering the limitations of traditional Random Walk wi...


Random Walk with Wait and Restart on Document Co-citation Network for Similar Document Search

One of the latest algorithms for computing similarities between nodes in a graph is Random Walk with Restart (RWR). However, on a document co-citation network for similar document search, computing transition probabilities remains difficult. To solve the problem, this paper proposes a Random Walk with Wait and Restart (RWWR) algorithm, which contains a new technique for adjusting the transition...


Restart and Random Walk in Local Search for Maximum Vertex Weight Cliques with Evaluations in Clustering Aggregation

The Maximum Vertex Weight Clique (MVWC) problem is NP-hard and also important in real-world applications. In this paper we propose to use the restart and the random walk strategies to improve local search for MVWC. If a solution is revisited in some particular situation, the search will restart. In addition, when the local search has no other options except dropping vertices, it will use random ...


Second Order Moment Asymptotic Expansions for a Randomly Stopped and Standardized Sum

This paper establishes the first four moment expansions to the order $o(a^{-1})$ of $S'_{t_a}/\sqrt{t_a}$, where $S'_n=\sum_{i=1}^{n}Y_i$ is a simple random walk with $E(Y_i)=0$, and $t_a$ is a stopping time given by $t_a=\inf\{n\geq 1 : n+S_n+\zeta_n>a\}$, where $S_n=\sum_{i=1}^{n}X_i$ is another simple random walk with $E(X_i)=0$, and $\{\zeta_n, n\geq 1\}$ is a sequence of ran...


Journal:
  • CoRR

Volume abs/1708.02574

Publication date: 2017